Active consumption of digital documents has opened up scope for research in various applications, including search. Traditionally, search within a document is cast as a text-matching problem, ignoring the rich layout and visual cues commonly present in structured documents, forms, etc. To that end, we pose a mostly unexplored question: "Given a single query instance of a document snippet, can we search for other similar snippets present in a target document page?". We propose MONOMER to solve this as a one-shot snippet detection task. MONOMER fuses context from the visual, textual, and spatial modalities of the snippet and the document to locate the query snippet in the target document. We conduct extensive ablations and experiments showing that MONOMER outperforms several baselines from one-shot object detection (BHRL), template matching, and document understanding (LayoutLMv3). Since relevant data is scarce for the task at hand, we train MONOMER on programmatically generated data containing many visually similar query snippets paired with target documents from two datasets, Flamingo Forms and PubLayNet. We also conduct a human study to validate the generated data.
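The abstract gives no implementation details, but the multi-modal fusion it describes can be sketched roughly: embed a snippet's visual, textual, and spatial features into one vector and score candidate regions of the target page by similarity. Everything below (encoders, dimensions, the fusion-by-concatenation scheme, the threshold) is a hypothetical illustration, not MONOMER's actual design.

```python
# Hypothetical sketch of one-shot snippet matching via multi-modal fusion.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SnippetEncoder(nn.Module):
    """Fuses visual, textual, and spatial features into one normalized embedding."""
    def __init__(self, vis_dim=512, txt_dim=768, spa_dim=4, out_dim=256):
        super().__init__()
        self.proj = nn.Linear(vis_dim + txt_dim + spa_dim, out_dim)

    def forward(self, vis, txt, box):
        # vis: (N, vis_dim) image features, txt: (N, txt_dim) text features,
        # box: (N, 4) normalized x0, y0, x1, y1 coordinates of the region.
        fused = torch.cat([vis, txt, box], dim=-1)
        return F.normalize(self.proj(fused), dim=-1)

encoder = SnippetEncoder()
query = encoder(torch.randn(1, 512), torch.randn(1, 768), torch.rand(1, 4))
candidates = encoder(torch.randn(20, 512), torch.randn(20, 768), torch.rand(20, 4))
scores = (candidates @ query.T).squeeze(1)  # cosine similarity per candidate region
matches = scores > 0.8                      # threshold is arbitrary in this sketch
```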
In this paper, we consider the problem of path finding for a set of homogeneous and autonomous agents navigating a previously unknown stochastic environment. In our problem setting, each agent attempts to maximize a given utility function while respecting safety properties. Our solution is based on ideas from evolutionary game theory, namely replicating policies that perform well and diminishing those that do not. We do a comprehensive comparison with related multi-agent planning methods and show that our technique beats state-of-the-art RL algorithms in minimizing path length by nearly 30% in large spaces. We show that our algorithm is computationally faster than deep RL methods by at least an order of magnitude. We also show that it scales better with an increase in the number of agents compared to other methods, path-planning methods in particular. Lastly, we empirically demonstrate that the policies we learn are evolutionarily stable and thus impervious to invasion by any other policy.
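The "replicate what performs well, diminish what does not" idea is the replicator dynamic from evolutionary game theory. The discrete-time update below is the generic textbook form, not the paper's exact algorithm; the utilities here are fixed stand-ins for measured path quality.

```python
import numpy as np

def replicator_step(shares, utilities, dt=0.1):
    """One discrete-time replicator update: x_i += dt * x_i * (u_i - mean utility)."""
    avg = shares @ utilities                      # population-average utility
    shares = shares + dt * shares * (utilities - avg)
    return shares / shares.sum()                  # renormalize to a distribution

# Three candidate policies with fixed utilities (placeholders for measured rewards).
shares = np.array([1 / 3, 1 / 3, 1 / 3])
utilities = np.array([1.0, 1.5, 0.5])
for _ in range(100):
    shares = replicator_step(shares, utilities)
print(shares)  # mass concentrates on the highest-utility policy
```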
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
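Since the BLOOM weights are publicly released, trying the model is straightforward with the Hugging Face transformers library. The snippet below uses the smaller public checkpoint bigscience/bloom-560m; the full 176B model (bigscience/bloom) has the same interface but requires multi-GPU inference.

```python
# Generate text with a released BLOOM checkpoint via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

inputs = tokenizer("Translate to French: Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```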
Recent developments in quantum computing and machine learning have propelled the interdisciplinary study of quantum machine learning. Sequential modeling is an important task with high scientific and commercial value. Existing VQC- or QNN-based methods require significant computational resources to perform gradient-based optimization of a large number of quantum circuit parameters. The major drawback is that such quantum gradient calculation requires a large number of circuit evaluations, posing challenges on current near-term quantum hardware and simulation software. In this work, we approach sequential modeling by applying a reservoir computing (RC) framework to quantum recurrent neural networks (QRNN-RC) based on classical RNN, LSTM and GRU architectures. The main idea of this RC approach is that the QRNN with randomly initialized weights is treated as a dynamical system and only the final classical linear layer is trained. Our numerical simulations show that the QRNN-RC reaches results comparable to fully trained QRNN models on several function approximation and time series prediction tasks. Since the QRNN training complexity is significantly reduced, the proposed model trains notably faster. In this work, we also compare against corresponding classical RNN-based RC implementations and show that the quantum version learns faster, requiring fewer training epochs in most cases. Our results demonstrate a new possibility of using quantum neural networks for sequential modeling with greater quantum hardware efficiency, an important design consideration for noisy intermediate-scale quantum (NISQ) computers.
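The reservoir-computing recipe (freeze the recurrent weights at their random initialization, train only the final linear readout) is easy to illustrate with a classical analogue. The echo-state-style sketch below stands in for the quantum recurrent cell that the paper uses; it is a minimal classical sketch, not the paper's QRNN.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_res = 1, 100

# Fixed, randomly initialized reservoir (never trained), as in the RC framework.
W_in = rng.normal(0, 0.5, (n_res, n_in))
W = rng.normal(0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # keep spectral radius below 1

def run_reservoir(u):
    h, states = np.zeros(n_res), []
    for x in u:
        h = np.tanh(W_in @ np.atleast_1d(x) + W @ h)
        states.append(h.copy())
    return np.array(states)

# Train only the linear readout to predict the next value of a sine wave.
u = np.sin(np.linspace(0, 20 * np.pi, 1000))
H, y = run_reservoir(u[:-1]), u[1:]
W_out = np.linalg.lstsq(H, y, rcond=None)[0]     # the only trained parameters
print("train MSE:", np.mean((H @ W_out - y) ** 2))
```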
Audio-text retrieval takes a natural language query to retrieve relevant audio files from a database. Conversely, text-audio retrieval takes an audio file as the query to retrieve relevant natural language descriptions. Most of the literature trains retrieval systems with a single audio-captioning dataset, but the benefit of training on multiple datasets is underexplored. Moreover, retrieval systems have to learn the alignment between detailed sentences and audio content whose length varies from a few seconds to several minutes. In this work, we propose a new collection of web audio-text pairs and a new retrieval framework. First, we provide a new collection of about five thousand web audio-text pairs, which we call WavText5K. When used to train our retrieval system, WavText5K improves performance more than other audio-captioning datasets. Second, our framework learns to connect language and audio content using a text encoder, two audio encoders, and a contrastive learning objective. Combining the two audio encoders helps to process variable-length audio. These two contributions beat the state-of-the-art performance on AudioCaps and Clotho for text-audio retrieval by a relative 2% and 16%, and for audio-text retrieval by 6% and 23%.
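The contrastive objective tying the text encoder to the audio encoders is CLIP-style; the sketch below is a generic symmetric InfoNCE loss over a batch of paired embeddings. Averaging the two audio encoders' outputs is an assumption for illustration, since the abstract does not specify the fusion.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(text_emb, audio_emb_a, audio_emb_b, temperature=0.07):
    """Symmetric InfoNCE between text and fused audio embeddings (batch-paired)."""
    audio_emb = F.normalize((audio_emb_a + audio_emb_b) / 2, dim=-1)  # assumed fusion
    text_emb = F.normalize(text_emb, dim=-1)
    logits = text_emb @ audio_emb.T / temperature   # (B, B) similarity matrix
    labels = torch.arange(logits.size(0))           # matching pairs on the diagonal
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels)) / 2

loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```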
With advances in computer vision technology, the need to classify images based on their features has become a major task and necessity. In this project, we propose two models: the first performs feature extraction and classification using ORB and an SVM, and the second uses a CNN architecture. The end result of the project is to understand the concepts behind feature extraction and image classification. The trained CNN model is also converted to the TFLite format for Android development.
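The first model's pipeline (ORB keypoint descriptors fed to an SVM) can be sketched with OpenCV and scikit-learn. Mean-pooling the variable number of descriptors per image into a fixed-length vector is one simple choice assumed here; a bag-of-visual-words step is a common alternative. File paths and labels are placeholders.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

orb = cv2.ORB_create(nfeatures=500)

def orb_feature(path):
    """Mean-pooled ORB descriptor: a fixed-length 32-dim vector per image."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    if desc is None:                        # no keypoints found
        return np.zeros(32, dtype=np.float32)
    return desc.mean(axis=0).astype(np.float32)

# 'paths' and 'labels' stand in for an actual labeled image set.
paths, labels = ["cat1.jpg", "dog1.jpg"], [0, 1]
X = np.stack([orb_feature(p) for p in paths])
clf = SVC(kernel="rbf").fit(X, labels)
print(clf.predict(X))
```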
In this project, we propose a CNN architecture to detect anomalous and suspicious activities. The activities chosen for the project are running, jumping, and kicking in public places, and carrying guns, bats, and knives in public places. We compare our trained model with previous models such as YOLO, VGG16, and VGG19. The trained model is then deployed for real-time detection, and the trained .h5 model is converted to the TFLite format to build an Android app for classification.
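The .h5-to-TFLite conversion mentioned at the end is a standard TensorFlow step. A minimal version, assuming the trained Keras model was saved as model.h5 (a placeholder filename):

```python
import tensorflow as tf

# Load the trained Keras model saved in HDF5 format.
model = tf.keras.models.load_model("model.h5")

# Convert to TFLite for the Android app; default optimizations shrink the file.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```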
The proposed shopping-assistant model SANIP will help blind people detect hand-held objects and receive video feedback about the information retrieved from the detected objects. The proposed model consists of three Python models, namely custom object detection, text detection, and barcode detection. For detecting hand-held objects, we created our own custom dataset comprising everyday goods such as Parle-G, Tide, and Lays. Besides these, we also collected images of shopping carts and exit signs, since using a cart is essential for anyone, and noticing the exit sign matters in an emergency. For the other two models, the detected text and barcode information is converted from text to speech and relayed to the blind person. The model was used to detect the objects it was trained on and successfully detected and recognized the desired outputs with good accuracy and precision.
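The barcode-to-speech leg of such a pipeline can be illustrated with common off-the-shelf libraries; pyzbar and pyttsx3 below are plausible stand-ins, since the abstract does not name the exact libraries used.

```python
import cv2
import pyttsx3
from pyzbar.pyzbar import decode

def speak_barcodes(image_path):
    """Decode any barcodes in the image and read their contents aloud."""
    img = cv2.imread(image_path)
    engine = pyttsx3.init()
    for barcode in decode(img):
        text = barcode.data.decode("utf-8")
        engine.say(f"Detected {barcode.type} barcode: {text}")
    engine.runAndWait()

speak_barcodes("product.jpg")  # 'product.jpg' is a placeholder image
```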
Intent detection is a key part of any natural language understanding (NLU) system of a conversational assistant. For email conversations, where multiple directives and intents are present, detecting the correct intent is essential yet difficult. In this setting, the conversational context can become a key disambiguating factor for detecting the user's request to the assistant. One prominent way of incorporating context is to model the past conversation history, as in task-oriented dialogue models. However, the long-form nature of email conversations restricts direct use of the latest advances in task-oriented dialogue models. So, in this paper, we provide an effective transfer-learning framework (EMToD) that allows the latest developments in dialogue models to be adapted to long-form conversations. We show that the proposed EMToD framework improves intent-detection performance over pre-trained language models by 45% and over pre-trained dialogue models by 30% for task-oriented email conversations. Additionally, the modular nature of the proposed framework allows plug-and-play for any future developments in both pre-trained language models and task-oriented dialogue models.
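The plug-and-play claim suggests the intent detector treats its backbone as a swappable module. The sketch below illustrates that generic pattern only; the encoder choice and the single-layer head are assumptions, not EMToD's architecture.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class IntentDetector(nn.Module):
    """A classification head over a pluggable pre-trained encoder."""
    def __init__(self, backbone="bert-base-uncased", num_intents=10):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(backbone)  # swappable module
        self.head = nn.Linear(self.encoder.config.hidden_size, num_intents)

    def forward(self, **inputs):
        hidden = self.encoder(**inputs).last_hidden_state[:, 0]  # [CLS] token
        return self.head(hidden)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = IntentDetector()
logits = model(**tokenizer("Please schedule a meeting for Friday.", return_tensors="pt"))
print(logits.shape)  # (1, num_intents)
```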
We propose a novel deep-learning model named ACLNet for segmenting clouds in ground-based images. ACLNet uses both a deep neural network and a machine learning (ML) algorithm to extract complementary features. Specifically, it uses EfficientNet-B0 as the backbone and atrous spatial pyramid pooling (ASPP) to learn at multiple receptive fields and extract fine-grained details from the image. ACLNet also uses k-means clustering to extract cloud boundaries more precisely. ACLNet is effective for both daytime and nighttime images. It provides a lower error rate, higher recall, and higher F1-score than state-of-the-art cloud segmentation models. The source code of ACLNet is available at: https://github.com/ckmvigil/aclnet.
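The k-means idea can be illustrated simply: cluster pixel intensities into two groups (cloud vs. sky) and take the brighter cluster as cloud. The sketch below applies this to a whole grayscale image, which is a simplification for illustration; the released code at the URL above is the authoritative implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_cloud_mask(gray_image):
    """Cluster pixel intensities into cloud/sky; brighter cluster = cloud."""
    pixels = gray_image.reshape(-1, 1).astype(np.float32)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
    cloud_label = int(np.argmax(km.cluster_centers_))  # clouds are brighter
    return (km.labels_ == cloud_label).reshape(gray_image.shape)

sky = np.random.default_rng(0).integers(0, 255, (64, 64)).astype(np.float32)
mask = kmeans_cloud_mask(sky)
print(mask.mean())  # fraction of pixels labeled as cloud
```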